26 research outputs found
Controllable Multi-domain Semantic Artwork Synthesis
We present a novel framework for multi-domain synthesis of artwork from
semantic layouts. One of the main limitations of this challenging task is the
lack of publicly available segmentation datasets for art synthesis. To address
this problem, we propose a dataset, which we call ArtSem, that contains 40,000
images of artwork from 4 different domains with their corresponding semantic
label maps. We generate the dataset by first extracting semantic maps from
landscape photography and then propose a conditional Generative Adversarial
Network (GAN)-based approach to generate high-quality artwork from the semantic
maps without necessitating paired training data. Furthermore, we propose an
artwork synthesis model that uses domain-dependent variational encoders for
high-quality multi-domain synthesis. The model is improved and complemented
with a simple but effective normalization method that normalizes the semantic
and style information jointly, which we call Spatially STyle-Adaptive
Normalization (SSTAN). In contrast to previous methods that only take semantic
layout as input, our model is able to learn a joint representation of both
style and semantic information, which leads to better generation quality for
synthesizing artistic images. Results indicate that our model learns to
separate the domains in the latent space, and thus, by identifying the
hyperplanes that separate the different domains, we can also perform
fine-grained control of the synthesized artwork. By combining our proposed
dataset and approach, we are able to generate user-controllable artwork that is
of higher quality than existing approaches.
Comment: 15 pages, accepted by CVMJ, to appear.
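The abstract names the SSTAN layer but gives no implementation details. Below is a minimal sketch of how a spatially style-adaptive normalization layer might jointly condition on a semantic layout and a style latent, assuming a SPADE-like design; the layer structure, dimensions, and fusion scheme are illustrative assumptions, not the authors' code.

# Hypothetical sketch of a spatially style-adaptive normalization layer.
# Assumes the semantic map is given as one-hot label channels and the style
# as a per-image latent vector; the fusion below is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyStyleAdaptiveNorm(nn.Module):
    def __init__(self, num_features, num_labels, style_dim, hidden=128):
        super().__init__()
        # Parameter-free normalization of the incoming activations.
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # Shared convolutional trunk over the semantic layout.
        self.shared = nn.Sequential(
            nn.Conv2d(num_labels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Style vector is broadcast spatially and fused with the layout features.
        self.style_proj = nn.Linear(style_dim, hidden)
        self.to_gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap, style):
        # x:      (B, C, H, W) generator activations
        # segmap: (B, L, H', W') one-hot semantic layout
        # style:  (B, S) style latent from a domain-dependent encoder
        normalized = self.norm(x)
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        ctx = self.shared(segmap)
        ctx = ctx + self.style_proj(style)[:, :, None, None]  # joint semantic + style context
        gamma = self.to_gamma(ctx)
        beta = self.to_beta(ctx)
        return normalized * (1 + gamma) + beta

Predicting the modulation from both inputs, rather than from the layout alone, is one way a generator could learn the joint semantic-and-style representation the abstract describes.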
Generative Colorization of Structured Mobile Web Pages
Color is a critical design factor for web pages, affecting viewer emotions as
well as the overall trust in and satisfaction with a website.
Effective coloring requires design knowledge and expertise, but if this process
could be automated through data-driven modeling, efficient exploration and
alternative workflows would be possible. However, this direction remains
underexplored due to the lack of a formalization of the web page colorization
problem, datasets, and evaluation protocols. In this work, we propose a new
dataset consisting of e-commerce mobile web pages in a tractable format, which
are created by simplifying the pages and extracting canonical color styles with
a common web browser. The web page colorization problem is then formalized as a
task of estimating plausible color styles for a given web page content with a
given hierarchical structure of the elements. We present several
Transformer-based methods that are adapted to this task by prepending
structural message passing to capture hierarchical relationships between
elements. Experimental results, including a quantitative evaluation designed
for this task, demonstrate the advantages of our methods over statistical and
image colorization methods. The code is available at
https://github.com/CyberAgentAILab/webcolor.
Comment: Accepted to WACV 2023.
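The abstract describes prepending structural message passing to a Transformer so that the hierarchy of page elements informs color estimation. The sketch below illustrates one way to do this, assuming elements arrive as embeddings with a parent index per element; the module names, single message-passing round, and discretized color head are assumptions for illustration, not the released model.

# Hypothetical sketch: one round of parent->child message passing, then a
# standard Transformer encoder, then per-element color-style logits.
import torch
import torch.nn as nn

class StructureAwareColorizer(nn.Module):
    def __init__(self, d_model=256, num_layers=4, num_colors=256):
        super().__init__()
        self.msg = nn.Linear(d_model, d_model)
        self.update = nn.GRUCell(d_model, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Predict a discretized color style per element (e.g., quantized color bins).
        self.head = nn.Linear(d_model, num_colors)

    def forward(self, elem_feats, parent_index):
        # elem_feats:   (B, N, D) content/layout embeddings of page elements
        # parent_index: (B, N) index of each element's parent (root points to itself)
        B, N, D = elem_feats.shape
        parent_feats = torch.gather(
            elem_feats, 1, parent_index.unsqueeze(-1).expand(B, N, D)
        )
        messages = torch.tanh(self.msg(parent_feats))
        fused = self.update(
            messages.reshape(B * N, D), elem_feats.reshape(B * N, D)
        ).reshape(B, N, D)
        # Standard Transformer encoding on the structure-aware features.
        h = self.encoder(fused)
        return self.head(h)  # per-element color-style logits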
Towards Flexible Multi-modal Document Models
Creative workflows for generating graphical documents involve complex
inter-related tasks, such as aligning elements, choosing appropriate fonts, or
employing aesthetically harmonious colors. In this work, we attempt to build
a holistic model that can jointly solve many different design tasks. Our model,
which we denote by FlexDM, treats vector graphic documents as a set of
multi-modal elements, and learns to predict masked fields such as element type,
position, styling attributes, image, or text, using a unified architecture.
Through the use of explicit multi-task learning and in-domain pre-training, our
model can better capture the multi-modal relationships among the different
document fields. Experimental results corroborate that our single FlexDM is
able to successfully solve a multitude of different design tasks, while
achieving performance that is competitive with task-specific and costly
baselines.
Comment: To be published in CVPR2023 (highlight), project page:
https://cyberagentailab.github.io/flex-d
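The abstract frames FlexDM as masked-field prediction over multi-modal document elements. The following is a minimal sketch of that idea under assumed field names and vocabularies: selected fields are replaced by a learned mask embedding and a shared Transformer predicts the hidden values. It is an illustration of the masking scheme, not the FlexDM architecture itself.

# Hypothetical sketch of masked multi-modal field prediction.
import torch
import torch.nn as nn

FIELDS = ["type", "position", "color", "text"]  # assumed field set

class MaskedDocumentModel(nn.Module):
    def __init__(self, d_model=256,
                 vocab_sizes={"type": 16, "position": 128, "color": 64, "text": 1000}):
        super().__init__()
        self.embed = nn.ModuleDict(
            {f: nn.Embedding(v, d_model) for f, v in vocab_sizes.items()}
        )
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.heads = nn.ModuleDict(
            {f: nn.Linear(d_model, v) for f, v in vocab_sizes.items()}
        )

    def forward(self, fields, mask):
        # fields: dict of (B, N) discrete tokens per field
        # mask:   dict of (B, N) booleans; True means the field value is hidden
        tokens = []
        for f in FIELDS:
            emb = self.embed[f](fields[f])                        # (B, N, D)
            emb = torch.where(mask[f].unsqueeze(-1), self.mask_token, emb)
            tokens.append(emb)
        # Flatten (element, field) pairs into one sequence for the shared encoder.
        seq = torch.cat(tokens, dim=1)
        h = self.encoder(seq)
        B, N = fields[FIELDS[0]].shape
        chunks = h.split(N, dim=1)
        return {f: self.heads[f](chunks[i]) for i, f in enumerate(FIELDS)}

Varying which fields are masked at training time is what would let a single model of this kind serve many design tasks (fill in positions, pick colors, complete text) at inference.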
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
Controllable layout generation aims at synthesizing a plausible arrangement of
element bounding boxes with optional constraints, such as the type or position of a
specific element. In this work, we try to solve a broad range of layout
generation tasks in a single model that is based on discrete state-space
diffusion models. Our model, named LayoutDM, naturally handles the structured
layout data in the discrete representation and learns to progressively infer a
noiseless layout from the initial input, where we model the layout corruption
process by modality-wise discrete diffusion. For conditional generation, we
propose to inject layout constraints in the form of masking or logit adjustment
during inference. We show in the experiments that our LayoutDM successfully
generates high-quality layouts and outperforms both task-specific and
task-agnostic baselines on several layout tasks.
Comment: To be published in CVPR2023, project page:
https://cyberagentailab.github.io/layout-dm
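The abstract states that constraints are injected at inference time as masking or logit adjustment. The sketch below shows what a single constrained sampling step of a discrete-diffusion layout model might look like; the model signature, token layout, and bias tensor are assumptions for illustration and not the LayoutDM codebase.

# Hypothetical sketch of constraint injection during discrete-diffusion sampling.
import torch

@torch.no_grad()
def constrained_sample_step(model, z_t, t, known_tokens, known_mask, logit_bias):
    # z_t:          (B, N) current noisy layout tokens
    # known_tokens: (B, N) user-specified tokens (e.g., element type or position)
    # known_mask:   (B, N) True where a token is constrained
    # logit_bias:   (B, N, V) additive bias, e.g., -inf on disallowed token values
    logits = model(z_t, t)               # (B, N, V) denoising distribution (assumed call)
    logits = logits + logit_bias         # soft constraints via logit adjustment
    probs = torch.softmax(logits, dim=-1)
    z_next = torch.multinomial(probs.flatten(0, 1), 1).view_as(z_t)
    # Hard constraints: overwrite the constrained positions with the given tokens.
    return torch.where(known_mask, known_tokens, z_next)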
Diffusion-based Holistic Texture Rectification and Synthesis
We present a novel framework for rectifying occlusions and distortions in
degraded texture samples from natural images. Traditional texture synthesis
approaches focus on generating textures from pristine samples, which require
meticulous human preparation and are often unattainable in natural images.
These challenges stem from the frequent occlusions and
distortions of texture samples in natural images due to obstructions and
variations in object surface geometry. To address these issues, we propose a
framework that synthesizes holistic textures from degraded samples in natural
images, extending the applicability of exemplar-based texture synthesis
techniques. Our framework utilizes a conditional Latent Diffusion Model (LDM)
with a novel occlusion-aware latent transformer. This latent transformer not
only effectively encodes texture features from partially-observed samples
necessary for the generation process of the LDM, but also explicitly captures
long-range dependencies in samples with large occlusions. To train our model,
we introduce a method for generating synthetic data by applying geometric
transformations and free-form mask generation to clean textures. Experimental
results demonstrate that our framework significantly outperforms existing
methods both quantitatively and qualitatively. Furthermore, we conduct
comprehensive ablation studies to validate the different components of our
proposed framework. Results are corroborated by a perceptual user study which
highlights the efficiency of our proposed approach.
Comment: SIGGRAPH Asia 2023 Conference Papers.
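The abstract mentions building training data by applying geometric transformations and free-form mask generation to clean textures. The sketch below shows one plausible way to construct such a degraded/clean pair; the warp parameters, random-walk stroke mask, and zero-fill of occluded pixels are assumptions for illustration, not the paper's data pipeline.

# Hypothetical sketch: degrade a pristine texture with a random warp and a
# free-form "brush stroke" occlusion mask to form a training pair.
import numpy as np
import torch
import torch.nn.functional as F

def random_free_form_mask(h, w, num_strokes=5, max_width=20, rng=None):
    rng = rng or np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.float32)
    for _ in range(num_strokes):
        x, y = rng.integers(0, w), rng.integers(0, h)
        for _ in range(rng.integers(10, 40)):          # random walk of the stroke
            x = int(np.clip(x + rng.integers(-15, 16), 0, w - 1))
            y = int(np.clip(y + rng.integers(-15, 16), 0, h - 1))
            r = rng.integers(5, max_width)
            yy, xx = np.ogrid[:h, :w]
            mask[(yy - y) ** 2 + (xx - x) ** 2 <= r * r] = 1.0
    return torch.from_numpy(mask)

def degrade_texture(clean, rng=None):
    # clean: (C, H, W) float tensor in [0, 1]; returns (degraded, mask).
    rng = rng or np.random.default_rng()
    C, H, W = clean.shape
    # Mild random affine warp as a stand-in for surface-geometry distortion.
    theta = torch.tensor(
        [[1.0 + 0.2 * rng.standard_normal(), 0.2 * rng.standard_normal(), 0.0],
         [0.2 * rng.standard_normal(), 1.0 + 0.2 * rng.standard_normal(), 0.0]],
        dtype=torch.float32).unsqueeze(0)
    grid = F.affine_grid(theta, [1, C, H, W], align_corners=False)
    warped = F.grid_sample(clean.unsqueeze(0), grid, align_corners=False).squeeze(0)
    mask = random_free_form_mask(H, W, rng=rng)
    degraded = warped * (1.0 - mask)                   # occluded regions zeroed out
    return degraded, mask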